ObservationGenerator#
The obversation_generator module provides classes for generating and managing observations during the beam search process in the LMCSC (Language Model-based Corrector with Semantic Constraints) system.
Key Components#
BaseObversationGenerator: Abstract base class for observation generators.
NextObversationGenerator: Concrete implementation of an observation generator.
BaseObversationGenerator#
The BaseObversationGenerator class serves as an abstract base class for observation generators. It defines the interface that all observation generators should implement.
Key Methods:#
reorder: Reorders the beams based on given indices.
step: Performs a step in the beam search process.
show_steps: Displays the steps taken in the beam search process.
get_observed_sequences: Retrieves the observed sequences from the beam search process.
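The interface listed above can be sketched with Python's abc module. This is only an illustration: the class name SketchObservationGenerator is hypothetical, and apart from reorder(beam_idx: List[int]) -> None, which is documented further down, the method signatures are assumptions and not the library's actual ones.

```python
from abc import ABC, abstractmethod
from typing import List


class SketchObservationGenerator(ABC):
    """Hypothetical stand-in for the interface described above."""

    @abstractmethod
    def reorder(self, beam_idx: List[int]) -> None:
        """Reorder per-beam state so it follows the surviving beams."""

    @abstractmethod
    def step(self, *args, **kwargs) -> None:
        """Advance the recorded state by one beam search step (arguments assumed)."""

    @abstractmethod
    def show_steps(self) -> None:
        """Display the steps recorded so far."""

    @abstractmethod
    def get_observed_sequences(self) -> list:
        """Return the observed sequences for the current state (return type assumed)."""
```

Because every method is abstract, instantiating the sketch class directly raises TypeError; a subclass has to implement all four methods before it can be used.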
NextObversationGenerator#
The NextObversationGenerator class is a concrete implementation of BaseObversationGenerator. It records the progress of the beam search, tracking what has been generated so far and which characters are yet to be generated.
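One way to picture "what is yet to be generated" is as a window over the source that starts where the consumed prefix ends. The sketch below assumes that a beam's progress can be summarised by a count of consumed source units and that n_observed_chars bounds the window length. Both are assumptions for illustration; remaining_window and consumed are hypothetical names, not part of lmcsc.

```python
from typing import Union

Text = Union[str, bytes]


def remaining_window(src: Text, consumed: int, n_observed_chars: int) -> Text:
    """Return up to n_observed_chars units of src that follow the consumed prefix."""
    return src[consumed:consumed + n_observed_chars]


src = "hello world"
print(remaining_window(src, 0, 5))  # hello
print(remaining_window(src, 6, 8))  # world
```

Slicing works the same way on str and bytes, so the same helper covers both modes. Near the end of the source the window simply gets shorter.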
Key Features:#
Supports both string and byte-level operations
Tracks predictions, steps, and completion status for each beam
Provides verbose mode for detailed step tracking
Handles reordering of beams during search
Generates observed sequences based on the current state of the search
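The difference between string and byte-level operation can be illustrated with a small helper. This is a sketch under assumptions: prepare_source is a hypothetical name, and UTF-8 is assumed as the encoding since the documentation does not name one.

```python
from typing import List, Union


def prepare_source(src: List[str], is_bytes_level: bool) -> List[Union[str, bytes]]:
    """Encode each source string to UTF-8 bytes when operating at the byte level."""
    if is_bytes_level:
        return [s.encode("utf-8") for s in src]
    return list(src)


chars = prepare_source(["你好"], is_bytes_level=False)
raw = prepare_source(["你好"], is_bytes_level=True)
print(len(chars[0]))  # 2 characters
print(len(raw[0]))    # 6 bytes, three per character in UTF-8
```

At the byte level, positions and step counts refer to bytes, so a single multi-byte character spans several positions.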
API Documentation#
- class lmcsc.obversation_generator.NextObversationGenerator(src, n_beam, n_observed_chars, is_bytes_level, verbose=False)#
Bases: BaseObversationGenerator
This class records the progress of the beam search, tracking what has been generated so far and what characters are yet to be generated.
- Parameters:
src (List[str]) – The source sequences.
n_beam (int) – The number of beams for beam search.
n_observed_chars (int) – The number of characters to observe.
is_bytes_level (bool) – Whether to operate at the byte level.
verbose (bool, optional, defaults to False) – Whether to enable verbose mode.
- src#
The source sequences, potentially encoded to bytes.
- Type:
List[Union[str, bytes]]
- n_beam#
The number of beams.
- Type:
int
- n_observed_chars#
The number of characters to observe.
- Type:
int
- is_bytes_level#
Whether operating at byte level.
- Type:
bool
- verbose#
Verbose mode flag.
- Type:
bool
- batch_predicts#
Predictions for each beam in each batch.
- Type:
List[List[Union[str, bytes]]]
- batch_steps#
Steps taken for each beam in each batch.
- Type:
List[List[int]]
- batch_verbose_steps#
Verbose steps for each beam in each batch.
- Type:
List[List[List[Union[str, bytes]]]]
- is_finished#
Flags indicating if each beam in each batch is finished.
- Type:
List[List[bool]]
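The documented attribute types all follow a [batch][beam] nesting. A minimal sketch of how state with these shapes could be initialised is shown below. BeamState and init_state are hypothetical names, and the initial values (empty prediction, zero steps, empty verbose log, unfinished) are assumptions, not the library's confirmed behaviour.

```python
from dataclasses import dataclass, field
from typing import List, Union

Text = Union[str, bytes]


@dataclass
class BeamState:
    """Hypothetical container mirroring the documented attribute shapes."""
    batch_predicts: List[List[Text]] = field(default_factory=list)
    batch_steps: List[List[int]] = field(default_factory=list)
    batch_verbose_steps: List[List[List[Text]]] = field(default_factory=list)
    is_finished: List[List[bool]] = field(default_factory=list)


def init_state(src: List[Text], n_beam: int) -> BeamState:
    """Create one entry per beam for every source sequence in the batch."""
    # Match the empty prediction to the source type (str or bytes).
    empty: Text = b"" if src and isinstance(src[0], bytes) else ""
    n_batch = len(src)
    return BeamState(
        batch_predicts=[[empty] * n_beam for _ in range(n_batch)],
        batch_steps=[[0] * n_beam for _ in range(n_batch)],
        # Each beam needs its own list so logs are not shared between beams.
        batch_verbose_steps=[[[] for _ in range(n_beam)] for _ in range(n_batch)],
        is_finished=[[False] * n_beam for _ in range(n_batch)],
    )


state = init_state(["abc", "de"], n_beam=3)
print(state.batch_steps)  # [[0, 0, 0], [0, 0, 0]]
```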
- reorder(beam_idx: List[int]) → None#
Reorders the beams based on the given indices.
- Parameters:
beam_idx (List[int]) – The indices to reorder the beams.
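Reordering can be sketched for any [batch][beam] nested list. The example assumes beam_idx is a flat list of length batch_size * n_beam whose entries index into the flattened beams, as is common in beam search implementations. The documentation does not confirm this layout, and reorder_nested is a hypothetical helper, not a method of the class.

```python
from typing import List, TypeVar

T = TypeVar("T")


def reorder_nested(state: List[List[T]], beam_idx: List[int], n_beam: int) -> List[List[T]]:
    """Rebuild [batch][beam] state so slot k holds the entry selected by beam_idx[k]."""
    # Flatten to a single list of beams, pick by index, then regroup per batch.
    flat = [item for batch in state for item in batch]
    picked = [flat[i] for i in beam_idx]
    return [picked[b:b + n_beam] for b in range(0, len(picked), n_beam)]


predicts = [["ab", "ac"], ["xy", "xz"]]
print(reorder_nested(predicts, [1, 1, 2, 3], n_beam=2))
# [['ac', 'ac'], ['xy', 'xz']]
```

When one beam is selected twice, as with index 1 here, both slots refer to the same object. That is harmless for immutable values such as str, bytes, int, and bool, but mutable per-beam entries such as verbose step lists would need to be copied so that later updates to one beam do not leak into the other.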